In computer vision, rigid motion segmentation is the process of separating regions, features, or trajectories from a video sequence into coherent subsets of space and time. These subsets correspond to independent, rigidly moving objects in the scene. The goal of this segmentation is to differentiate and extract the meaningful rigid motion from the background and analyze it. Image segmentation techniques label each pixel as belonging to a group of pixels that share certain characteristics at a particular time. In rigid motion segmentation, by contrast, pixels are segmented according to their relative movement over a period of time, i.e. the duration of the video sequence.

A number of methods have been proposed for this task. Because the literature varies widely, there is no consistent way to classify motion segmentation algorithms. Depending on the segmentation criterion used, they can be broadly classified into the following categories: image difference, statistical methods, wavelets, layering, optical flow and factorization. Depending on the number of views required, the algorithms can also be two-view or multi-view based. Rigid motion segmentation has seen increasing application in recent years with the rise of surveillance and video editing. These algorithms are discussed further.

== Introduction to rigid motion ==
In general, motion can be considered a transformation of an object in space and time. If this transformation preserves the size and shape of the object, it is known as a rigid transformation. A rigid transformation can be rotational, translational or reflective. We define a rigid transformation mathematically as:

<math>F: \mathbb{R}^3 \to \mathbb{R}^3</math>

where F is a rigid transform if and only if it preserves distances (i.e. it is an isometry) and space orientation. A reflection preserves distances but reverses orientation, so it counts as rigid only under the broader definition that drops the orientation condition. In the sense of motion, a rigid transform is the movement of a rigid object in space.
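The two conditions in this definition, distance preservation and orientation preservation, can be illustrated with a small numerical check. The following is a minimal sketch, not a definitive test: the names `looks_rigid`, `signed_volume` and the three sample maps are hypothetical, and checking four sample points gives only a necessary condition for rigidity, not a proof.

```python
import itertools
import math


def signed_volume(p0, p1, p2, p3):
    """Six times the signed volume of the tetrahedron p0, p1, p2, p3."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    c = [p3[i] - p0[i] for i in range(3)]
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def looks_rigid(F, points, tol=1e-9):
    """Check distance and orientation preservation of F on four 3-D points."""
    images = [F(p) for p in points]
    # combinations() yields index pairs in the same order for both lists.
    for (p, q), (fp, fq) in zip(itertools.combinations(points, 2),
                                itertools.combinations(images, 2)):
        if abs(math.dist(p, q) - math.dist(fp, fq)) > tol:
            return False  # a pairwise distance changed: not an isometry
    # Orientation is preserved when the signed volume keeps its sign.
    return signed_volume(*points) * signed_volume(*images) > 0


# Hypothetical sample maps (assumptions for illustration only).
def rotate_z_then_shift(p):
    x, y, z = p
    return (-y + 2.0, x - 1.0, z + 0.5)  # 90 degree rotation about Z, then shift


def mirror(p):
    x, y, z = p
    return (-x, y, z)  # reflection: isometry that flips orientation


def scale(p):
    return tuple(2.0 * v for v in p)  # changes size: not an isometry


pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
```

Under these assumptions, `looks_rigid(rotate_z_then_shift, pts)` passes both checks, while `mirror` fails the orientation check and `scale` fails the distance check.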
As shown in Figure 1, the 3-D motion of a rigid object is the transformation from the original coordinates (X, Y, Z) to the transformed coordinates (X', Y', Z'), which results from a rotation and a translation captured by the rotation matrix R and the translation vector T, respectively. Hence the transform is:

<math>\begin{bmatrix} X' \\ Y' \\ Z' \end{bmatrix} = R \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + T</math>

where R is a 3×3 matrix whose 9 entries are determined by the rotation angles about the three axes (so only 3 of them are independent), and T has 3 unknowns (Tx, Ty, Tz), which account for translation in the X, Y and Z directions, respectively.

When this 3-D motion over time is captured by a camera (2-D), it corresponds to a change of pixels in subsequent frames of the video sequence. This transformation is also known as 2-D rigid body motion or the 2-D Euclidean transformation. It can be written as:

<math>X' = R X + t</math>

where X is the original pixel coordinate, X' is the transformed pixel coordinate, R is an orthonormal rotation matrix with R ⋅ Rᵀ = I and |R| = 1, and t is the translation vector in the 2-D image space.

To visualize this, consider a video sequence from a traffic surveillance camera. It contains moving cars, whose movement does not change their shape or size. Moreover, the movement is a combination of rotation and translation of the car in 3-D, which is reflected in subsequent video frames. The car is therefore said to undergo rigid motion.
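The 2-D Euclidean transformation described above, a rotation R followed by a translation t, can be sketched in a few lines. This is an illustrative example only: the function names `rotation_2d`, `is_rotation` and `rigid_2d`, the 30 degree angle, the translation and the two "car corner" pixels are all assumptions, not values from the source.

```python
import math


def rotation_2d(theta):
    """2x2 rotation matrix for a counter-clockwise angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def is_rotation(R, tol=1e-9):
    """Check the two conditions R * R^T = I and |R| = 1 for a 2x2 matrix."""
    rrt = [[sum(R[i][k] * R[j][k] for k in range(2)) for j in range(2)]
           for i in range(2)]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    orthonormal = all(abs(rrt[i][j] - identity[i][j]) <= tol
                      for i in range(2) for j in range(2))
    det = R[0][0] * R[1][1] - R[0][1] * R[1][0]
    return orthonormal and abs(det - 1.0) <= tol


def rigid_2d(x, R, t):
    """Apply the rotation R and then the translation t to a pixel x = (u, v)."""
    return (R[0][0] * x[0] + R[0][1] * x[1] + t[0],
            R[1][0] * x[0] + R[1][1] * x[1] + t[1])


# Hypothetical example: two corners of a car's outline in one frame.
R = rotation_2d(math.radians(30))
t = (5.0, -2.0)
corner_a, corner_b = (10.0, 20.0), (14.0, 23.0)
moved_a, moved_b = rigid_2d(corner_a, R, t), rigid_2d(corner_b, R, t)

# The distance between the corners is unchanged by the rigid motion.
before = math.dist(corner_a, corner_b)
after = math.dist(moved_a, moved_b)
```

Comparing `before` and `after` shows that the distance between the two corners is the same in both frames up to floating-point error, which is the sense in which the car keeps its shape and size. The 3-D case follows the same pattern with a 3×3 matrix and a 3-vector.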